Novel methods for coupled prediction of extreme wind speeds and wave heights

Two novel methods are being outlined that, when combined, can be used for spatiotemporal analysis of wind speeds and wave heights, thus contributing to global climate studies. First, the authors provide a unique reliability approach that is especially suited for multi-dimensional structural and environmental dynamic system responses that have been numerically simulated or observed over a substantial time range, yielding representative ergodic time series. Next, this work introduces a novel deconvolution extrapolation technique applicable to a wide range of environmental and engineering applications. Classical reliability approaches cannot cope with dynamic systems with high dimensionality and responses with complicated cross-correlation. The combined study of wind speed and wave height is notoriously difficult, since they comprise a very complex, multi-dimensional, non-linear environmental system. Additionally, global warming is a significant element influencing ocean waves throughout the years. Furthermore, the environmental system reliability method is crucial for structures working in any particular region of interest and facing actual and often harsh weather conditions. This research demonstrates the effectiveness of our approach by applying it to the concurrent prediction of wind speeds and wave heights from NOAA buoys in the North Pacific. This study aims to evaluate the state-of-the-art approach that extracts essential information about the extreme responses from observed time histories.

It is challenging to accurately assess multi-dimensional dynamic system reliability using conventional theoretical reliability methods [1][2][3][4] . Estimating the reliability of a complicated dynamic system is easy if sufficient system response data are available or if direct Monte Carlo simulations are performed 5 . However, experimental or computational costs may be expensive for many engineering dynamic systems with complicated dynamics. Motivated by this, the authors developed a new reliability method for dynamic systems that require significantly less measurement and computing costs.
It has been a challenge to how exactly one predicts the shape and characteristics of waves through wind speed variables, even though this has been started and partially solved in the works of [6][7][8][9][10][11] . Our work, however, deals with the general approach of extreme value theory in which no physical dynamics of the water waves are expected to play a significant role in driving the appearance of rare events, i.e. there is an expected universality of extreme events in a variety of different physical and natural systems [12][13][14][15][16][17] . Although beyond the scope of our work, there Some successful attempts have been to study extreme events in water waves (often called rogue or freak waves) with distributions uniquely determined by the dynamics of the physical system. For instance 18,19 , have shown that water waves departing from linear theory will modify their distribution from a Rayleigh type [20][21][22] to a distribution dependent on the square root of the wave steepness. Similarly 23,24 , have shown that a Rayleigh distribution modified by a polynomial function of the ratio between height and water depth controls extreme events in Hurricane data. In addition, spectrum bandwidth seems to have different types of effects in extreme wave distribution depending on whether they are in deep [25][26][27][28] or shallow water 13 . Furthermore, ocean processes such as shoaling or wave-current systems that drive wave trains out of equilibrium have been experimentally 22,25,[29][30][31][32][33][34][35][36][37] associated with increasing the occurrence of extreme waves by order of magnitude. However, it has been recently found that no established theoretical distribution to date, neither universal as Gumbel nor based on physical principles, can describe extreme wave statistics in a wide range of conditions  .
Note that methods introduced by authors here do not rely on Gumbel (or any other type) distribution type assumption, instead, Gumbel-based extrapolation was used just for comparison.

Method
Consider multi-degree of freedom (MDOF) jointly stationary dynamic environmental system with representative response vector process R(t) = (X(t), Y (t), Z(t), . . .) , that has been either measured or simulated over a sufficiently long time span (0, T) . Unidimensional global maxima over entire time span (0, T) is denoted as By sufficiently long time span T authors mean large enough value of T with respect to the environmental dynamic system auto-correlation time. Let X 1 , . . . , X N X be consequent temporal local maxima of the environmental process X(t) at discrete monotonously temporally increasing times t X 1 < . . . < t X N X within (0, T) . Similar definition follows for other MDOF environmental response components Y (t), Z(t), . . . with Y 1 , . . . , Y N Y ; Z 1 , . . . , Z N Z and so on. For simplicity, all R(t) components, maxima are assumed to be non-negative. The target now is to accurately estimate environmental system failure probability, namely its probability of exceedance . . being probability of non-exceedance for critical values of environmental response components η X , η Y , η Z ,…; ∪ denotes logical unity operation «or»; p X max T ,Y max T ,Z max T ,... being joint probability density of the global maxima over the entire time span (0, T).In practice, however, it is not always feasible to estimate directly p X max .. joint probability distribution due to its high dimensionality and limitations of the available data set. The moment when either X(t) exceeds η X , or Y (t) exceeds η Y , or Z(t) exceeds η Z , and so on, the environmental dynamic system being regarded as failed. Fixed failure levels η X , η Y , η Z ,… being individual for each unidimensional response . . < t Z N Z recorded in temporally non-decreasing order into one single synthetic merged time vector t 1 ≤ . . . ≤ t N . Note that then . In this case t j represents local maxima of one of MDOF environmental dynamic structural response components either X(t) or Y (t) , or Z(t) and so on. The latter means that having R(t) time record, one only needs simultaneously and continuously screen for the system unidimensional response component local maxima and record its exceedance of MDOF limit vector (η X , η Y , η Z , ...) in any of dynamic system components X, Y , Z, . . . Local unidimensional response component maxima then merged into one synthetic temporal non-decreasing vector � That is to say each local maxima R j is in fact actual encountered local maxima corresponding to either X(t) or Y (t) , or Z(t) and so on. Finally the unified limit vector (η 1 , . . . , η N ) is introduced with each component η j is either η X , η Y or η Z and so on, depending which of X(t) or Y (t) , or Z(t) etc. corresponding to the current local maxima with running index j.
Scaling parameter 0 < ≤ 1 being now introduced to artificially simultaneously decrease limit values for all system response components, i.e. new MDOF limit vector η X , η Y , η z , ... with η X ≡ · η X , ≡ · η Y , η z ≡ · η Z , … being introduced. The unified limit vector η 1 , . . . , η N being introduced with each component η j being equal to either η X , η Y or η z and so on The latter automatically defines the target probability P( ) as a function of , note that P ≡ P(1) from Eq. (1). Non-exceedance probability P( ) can be now estimated as follows In practice the dependence between neighbouring R j is not always negligible, thus following one-step (will be called conditioning level k = 1 ) memory approximation being introduced for 2 ≤ j ≤ N (will be called conditioning level k = 2 ). Approximation introduced by Eq. (3) may be further expressed as where 3 ≤ j ≤ N (called here conditioning level k = 3 ), and so on 38 . Equation (4) presents series of subsequent refinements of statistical independence assumption. The latter type of approximations models statistical dependence effect between neighbouring maxima with an increased accuracy. Since original MDOF dynamic process R(t) was assumed ergodic and therefore stationary, probability p k ( ) := Prob{R j > η j |R j−1 ≤ η j−1 , R j−k+1 ≤ η j−k+1 } for j ≥ k is independent of j but only dependent on conditioning level k . Thus non-exceedance probability may be now approximated www.nature.com/scientificreports/ Note that Eq. (5) follows from Eq. (1) by neglecting Prob R 1 ≤ η 1 ≈ 1 , as design failure probability must of a small order of magnitude, and N>>k. Equation (5) is similar to mean up-crossing rate equation for probability of exceedance [39][40][41][42][43][44][45][46][47][48][49][50][51] . There is convergence with respect to the conditioning parameter k Note that Eq. (5) for k = 1 turns into classic non-exceedance probability relationship with corresponding mean up-crossing rate function where ν + ( ) denotes the mean up-crossing rate of the response level for the above assembled non-dimensional vector R(t) assembled from scaled MDOF system response X η X , Y η Y , Z η Z , . . . . The mean up-crossing rate is given by the Rice's formula given in Eq. (7) with p RṘ being joint probability density for R,Ṙ with Ṙ being the time derivative R ′ (t) 1 . Equation (7) relies on the Poisson assumption, that is up-crossing events of high levels (in this paper it is ≥ 1 ) may be assumed to be independent.
The proposed methodology may be used for nonstationary cases as well. Given environmental scattered diagram of m = 1, .., M sea states, each environmental short-term sea state having probability q m , so that M m=1 q m = 1 . Next, let one introduce the long-term equation with p k ( , m) being the same function as in Eq. (6), but corresponding to a specific short-term sea state with number m . The above introduced p k ( ) as functions being often regular in the tail, specifically for values of approaching and exceeding 1 . More specifically, for ≥ 0 , the probability distribution tail behaves similar to exp −(a + b) c + d with a, b, c, d being suitably fitted constants for suitable tail cut-on 0 value. Therefore, one can write By plotting ln ln p k ( ) − d k versus ln(a k + b k ) , often nearly perfectly linear tail behaviour being observed. Optimal values of the parameters a k , b k , c k , p k , q k may be determined using a sequential quadratic programming (SQP) method incorporated in the NAG Numerical Library 52 .
Let us consider stationary stochastic process X(t) , being either simulated or measured over a certain time span 0 ≤ t ≤ T , and which is being represented as a sum of two independent identically distributed stationary environmental dynamic processes X 1 (t) and X 2 (t) , namely Denoting probability density function (PDF) of X(t) as p X and PDF of X 1 (t) and X 2 (t) as p X 1 , following convolution equation holds In order to exemplify the latter idea regarding how to estimate unknown probability distribution robustly p X 1 , subsequently improving given empirical distribution p X . In the next, authors briefly discuss common knowledge regarding the discrete convolution of two vectors. The convolution of two vectors, u and v , representing an area of overlap of vector components, as v slides across u . Algebraically, convolution being the same operation as multiplying polynomials whose coefficients are the elements of, u and v . Let m = length(u) and n = length(v) . Then w being the vector of length m + n − 1 , with k-th element being The sum is over all the values of j that lead to legal subscripts for u j and v k − j + 1 , specifically j = max(1, k + 1 − n) : 1 : min(k, m) . When m = n , as will be the main case in this study (5) P k ( ) ≈ exp (−N · p k ( )) , k ≥ 1. www.nature.com/scientificreports/ From Eq. (12) one can also observe that having found u = v = (u(1), .., u(n)) , one can gradually obtain w -components w(n + 1), . . . , w(2n − 1) , as index increases from n + 1 to 2n − 1 . The latter clearly would extend vector w into support domain that is twice longer than the original distribution support domain, i.e. doubling the p X distribution support length (2n − 1) · �x ≈ 2n · �x = 2X L , as compared to the original probability distribution support length n · x = X L with x being constant length of each descrete distribution bin. Note that w = (w(1), . . . , w(n)) being discrete representation of the target empirical distribution p X , and n representing length of probability distribution support [0, X L ] , for simplicity in this paper one is limited to the case of one-sided positive valued random variables, namely X ≥ 0 . As will be further discussed, a simple linear extrapolation of self deconvoluted vector (u(1), . . . , u(n)) towards (u(n + 1), . . . , u(2n − 1)) is suggested, with p X 1 having its tail linearly extrapolated within the range (X L , 2X L ). Using Eq. (11) original vector w will be extended and extrapolated into support domain being twice longer than original distribution support domain, namely doubling p X distribution support length (2n − 1) · �x ≈ 2n · �x = 2X L , as compared to original probability distribution support length n · x = X L . To smoothen original distribution p X (x) tail, authors have performed distribution p X tail interpolation for high tail values x , using Naess-Gaidai (NG) extrapolation method 53 . As has been discussed above, the proposed deconvolution extrapolation technique has an advantage of not presuming any specific extrapolation functional class needed to extrapolate distribution tail.
To validate the above-suggested extrapolation methodology, the «shorter» version of the original data set was used for extrapolation to compare with predictions based on a full «longer» data set.

Results
This section aims to demonstrate the efficiency of the previously described methodology by applying the new method to the National Oceanic and Atmospheric Administration buoy (NOAA) ten-minute average wind speed values and significant wave heights (calculated as the average of the highest one-third of all wave heights during the 20-min sampling period) in the North Pacific region near the Hawaiian Islands. For this study, the NOAA wind and wave measurement location Station 51003 (LLNR 28005.7)-WESTERN HAWAII-205 NM SW of Honolulu was selected, and its measured wind speed and wave height values were set as two environmental system components (dimensions) X, Y, thereby serving as an example of a two-dimensional (2D) environmental system. Unidimensional extreme response values were chosen as the whole analysed data set maximum wind speeds and wave heights respectively during observation period between years 2009-2014 for the chosen buoy in situ measurement location. Figure 1 presents National Oceanic and Atmospheric Administration 54 , buoy locations in North Pacific, blue circle indicates area of interest. Figure 2 shows data buoy containing sensors used to monitor and collect atmospheric and oceanographic conditions, collected data is then converted into an electronic signal and transmitted to shore or logged in the onboard data unit. This 3-m foam buoy has following characteristics: Site elevation: sea level. Air temp height: 3.4 m above site elevation. Anemometer height: 3.8 m above site elevation. Barometer elevation: 2.4 m above mean sea level. Sea temp depth: 2 m below water line. Water depth: 1987 m. In order to unify all two measured time series X, Y the following scaling was performed: making all two responses non-dimensional and having the same failure limit equal to 1. Next, all local maxima from two measured time series were merged into one single time series by keeping them in time non-decreasing order: � R = (max{X 1 , Y 1 }, . . . , max{X N , Y N }) with the whole vector R being sorted in temporally non-decreasing order of occurrence of these local maxima. Figure 3 presents example of non-dimensional assembled vector R , consisting of assembled local maxima of raw daily largest highest daily windcast speed data. The failure probability distribution tail extrapolation was performed towards 100 years return period. Synthetic vector R does not have physical meaning on its own, as it assembled of completely different response components. Index j is just a running index of local maxima encountered in non-decreasing time sequence. Index j being running index of local maxima encountered in non-decreasing time sequence. The «shorter» data record was generated by taking each tenth data point from the «longer» data record.  www.nature.com/scientificreports/ Figure 4 on the left presents the «shorter» data record f X 1 tail, obtained by deconvolution as in Eq. (13), and subsequently linearly extrapolated in the terminal tail section to cover the X 1 range matching the «longer» data record. Figure 4 on the right presents final unscaled results of the proposed in this paper technique, namely the «shorter» decimal log scale f X tail, extrapolated by deconvolution, along with «longer» data distribution tail and NG extrapolation.
It is seen that NG prediction does not agree with deconvolution approach, leading to non-conservative prediction, indicated by star in Fig. 4 on the right. It is seen from Fig. 4 on the right that the proposed method performs quite well, being based on the «shorter» data set, and delivering probability distribution close to the one based on the «longer» data set. Figure 4 on the right presents extrapolation according to Eq. (9) as well novel deconvolution method towards failure state with 100 year return period, which is 1, and somewhat beyond, = 0.05 cut-on value was used. Dotted lines indicate extrapolated 95% confidence interval according to Eq. (10). According to Eq. (5) p( ) is directly related to the target failure probability 1 − P from Eq. (1). Therefore, in agreement with Eq. (5) system failure probability 1 − P ≈ 1 − P k (1) may be now estimated. Note that in Eq. (5) N corresponds to a total number of local maxima in the unified response vector R.Conditioning parameter k = 6 was found to be sufficient due to occurrence of convergence with respect to k , see Eq. (6). The generalization potential of the proposed methods is extension to non-stationary systems with an underlying trend. If the underlying trend would be known, which is rarely the case, it may be subtracted to bring measurements to stationary state. Otherwise proper trend analysis would be needed in combination with advocated here methods. The main model assumptions subsequent limitation is the system quasi-stationarity as mentioned before.

Conclusions
The key advantage of introduced methodology is its ability to assess reliability of non-linear dynamic systems with high dimensionality. This paper studied wind speeds and wave heights measured by National Oceanic and Atmospheric Administration buoys in North Pacific region, during years 2009-2014. Novel environmental reliability method has been applied to predict occurrence of extreme waves within time horizon of 100 years. Theoretical reasoning behind the proposed method has been given in detail.
Methods introduced in this paper, have been previously validated by application to range of dynamic environmental systems, but for only one-dimensional system responses. This study has aimed at further development of robust and simple general purpose multi-dimensional reliability method. Both measured and numerically simulated time series responses can be analyzed. In case of measured system response, as illustrated in this paper, an accurate prediction of environmental system failure probability is possible.
Finally, the suggested methodology can be used in wide range of modern engineering areas of applications. The presented environmental example does not limit applicability area of new method by any means.

Data availability
The datasets analyzed during the current study are available on request. Please contact Prof. Yihan Xing, email: yihan.xing@uis.no. See also 54 .